
This document was inspired by this post by Arthur Charpentier.

The repository with the code for creating this post is here. So far it contains just some plots. We may try something more elaborate (e.g., explicit comparisons of Colombia with other countries) later on.

The data file is available here.

Entropy

One possible measure of inequality is entropy, the classical notion developed by Shannon for information theory. Here I use a generalized version of it, with \(\alpha=1\).

plot of chunk unnamed-chunk-3
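
As a rough illustration of what such a measure computes, here is a minimal sketch. It assumes that the "generalized version" refers to the generalized entropy index family GE(α) used in inequality studies, for which α=1 gives the Theil T index; the post does not spell out the exact formula, so treat that reading as an assumption. The function name `generalized_entropy` and the sample scores are made up for the example.

```python
import math


def generalized_entropy(scores, alpha=1.0):
    """Generalized entropy index GE(alpha) of strictly positive values.

    Assumed definition (not confirmed by the post):
      GE(alpha) = 1 / (n * alpha * (alpha - 1)) * sum((x_i / mean)**alpha - 1)
    with the usual limiting forms at alpha = 1 (Theil T) and alpha = 0.
    """
    n = len(scores)
    mean = sum(scores) / n
    ratios = [x / mean for x in scores]
    if alpha == 1:
        # Theil T index: mean of r * log(r), where r = x / mean
        return sum(r * math.log(r) for r in ratios) / n
    if alpha == 0:
        # Mean log deviation
        return -sum(math.log(r) for r in ratios) / n
    return sum(r ** alpha - 1 for r in ratios) / (n * alpha * (alpha - 1))


# Hypothetical scores, not taken from the data file
equal_scores = [500, 500, 500, 500]
spread_scores = [350, 450, 550, 650]
print(generalized_entropy(equal_scores))   # no inequality at all
print(generalized_entropy(spread_scores))  # some inequality
```

Under this definition the index is zero when every score is identical and does not change if all scores are multiplied by the same constant, since only the ratios to the mean enter the formula.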

Although entropy is not necessarily a measure of variance, collections of test scores with higher entropy tend to have a lower average than those with lower entropy:

plot of chunk unnamed-chunk-4

By the way, why is the entropy of female scores (almost) consistently lower than that of males of the same country?

Standard Deviation

Standard deviation seems to be the most commonly used indicator of performance inequality in standardized test scores. The ranking changes drastically:

plot of chunk unnamed-chunk-5

Distributions of scores (for selected countries)

First, a violin plot of distributions of scores in math (differentiating by sex and ordered by entropy):

plot of chunk unnamed-chunk-7

Empirical cumulative distribution function for each country in math:

plot of chunk unnamed-chunk-8
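
For readers who want to see what an empirical cumulative distribution function does, here is a small sketch in plain Python. It is not the code behind the plot; the function name `ecdf` and the scores are hypothetical.

```python
from bisect import bisect_right


def ecdf(sample):
    """Return F, where F(x) is the fraction of the sample that is <= x."""
    ordered = sorted(sample)
    n = len(ordered)

    def F(x):
        # bisect_right counts how many sorted values are <= x
        return bisect_right(ordered, x) / n

    return F


# Hypothetical math scores for one country
F = ecdf([380, 420, 455, 455, 510, 560])
print(F(455))  # share of students scoring 455 or less
```

A curve that sits further to the left means that a larger share of students falls below any given score.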

And a kernel density estimate (also for math):

plot of chunk unnamed-chunk-9
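
A kernel density estimate can be sketched in a few lines. The version below uses a Gaussian kernel and a normal-reference rule of thumb for the bandwidth; both are assumptions made for the example, since the post does not say which kernel or bandwidth the plot uses. The names `rule_of_thumb_bandwidth` and `kernel_density` and the scores are hypothetical.

```python
import math
import statistics


def rule_of_thumb_bandwidth(sample):
    """Normal-reference bandwidth: 1.06 * sd * n^(-1/5) (an assumed choice)."""
    return 1.06 * statistics.stdev(sample) * len(sample) ** (-1 / 5)


def kernel_density(sample, bandwidth):
    """Return a function that evaluates a Gaussian kernel density estimate."""
    n = len(sample)
    norm = 1.0 / (n * bandwidth * math.sqrt(2 * math.pi))

    def density(x):
        # Average of Gaussian bumps centred on each observation
        return norm * sum(
            math.exp(-0.5 * ((x - xi) / bandwidth) ** 2) for xi in sample
        )

    return density


# Hypothetical math scores
scores = [400, 450, 500, 550, 600]
density = kernel_density(scores, rule_of_thumb_bandwidth(scores))
print(density(500))
```

A smaller bandwidth gives a bumpier curve and a larger one a smoother curve, so the visual impression of a plot like this depends on that choice.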

Quantiles

In his post, Charpentier compares France’s score distribution with other countries by plotting the difference of the quantiles at each level. Here I do the same for Colombia against 17 other countries, using the math scores. Just as an illustration, I also include Colombia versus Colombia (red is male and blue is female).

plot of chunk unnamed-chunk-10
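
The quantile comparison can be sketched as follows. This is only an illustration under stated assumptions: quantiles are computed by linear interpolation between order statistics, and the difference is taken as the other country minus the reference country (the post does not state the sign convention). The function names and the scores are hypothetical.

```python
def quantile(ordered, p):
    """Linear-interpolation quantile of an already sorted list, 0 <= p <= 1."""
    pos = p * (len(ordered) - 1)
    lo = int(pos)
    hi = min(lo + 1, len(ordered) - 1)
    frac = pos - lo
    return ordered[lo] + frac * (ordered[hi] - ordered[lo])


def quantile_differences(reference, other, levels):
    """Quantile of `other` minus quantile of `reference` at each level."""
    ref = sorted(reference)
    oth = sorted(other)
    return [quantile(oth, p) - quantile(ref, p) for p in levels]


# Hypothetical math scores for a reference country and another country
reference_scores = [310, 350, 380, 410, 460]
other_scores = [400, 470, 520, 580, 650]
levels = [i / 10 for i in range(1, 10)]  # deciles 0.1, ..., 0.9
print(quantile_differences(reference_scores, other_scores, levels))
```

With this sign convention, a curve that stays above zero at every level means the other country scores higher than the reference country across the whole distribution, not just on average.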

And let’s do the same with Singapore:

plot of chunk unnamed-chunk-11

Further reading

Serious approaches to this problem: